This file contains Problem 2, which is the coding portion of Homework 4. Your job is to implement/modify the sections within this notebook that are marked with "TO DO".
This is an individual assignment, not a team project. Please review the Honor Code statement in the syllabus.
Your goal is to implement the KLT tracking algorithm. Write your code without using OpenCV functions, except for a few that are indicated in the instructions below. Approved OpenCV functions tend to be for low-level operations such as file I/O, image format manipulation (e.g., color/grayscale conversion), and highlighting image locations by drawing geometric shapes (e.g., circles).
You may use any code that you have written for previous assignments, such as your linear_filter function and edge-detection code from HW2. Basic math functions from Python/NumPy are also allowed.
# Mount your Google Drive to this notebook
# The purpose is to allow your code to access your files
from google.colab import drive
drive.mount('/content/drive')
# Change the directory to your own working directory
# Any files under your working directory are available to your code
# TO DO: enter the name of your directory
import os
os.chdir('/content/drive/MyDrive/Comp Vision')
# Import library modules
import sys
import cv2
import numpy as np
import matplotlib.pyplot as plt
# PIL is the Python Imaging Library
from PIL import Image
# The following is a substitute for cv2.imshow, which Colab does not allow
from google.colab.patches import cv2_imshow
print('Python version:', sys.version)
print('OpenCV version:', cv2.__version__)
print('NumPy version: ', np.__version__)
You have been given a folder that contains the "hotel" image sequence. This sequence is a set of 51 images: hotel.seq00.png, ..., hotel.seq50.png. If you display them in quick succession you will see a toy hotel undergo rotation and a small amount of translation. Keep these files inside the folder hotel_images, and upload that folder to your working directory on Colab.
Use the following code block to verify a correct upload, and to illustrate keypoint detection. This example finds Shi-Tomasi feature points in image 0 of the sequence, and indicates their locations by small green circles.
# The purpose of this code block is to verify that you can access the hotel
# images, and to illustrate examples of keypoints.
# Parts of this example were borrowed from
# https://docs.opencv.org/4.x/d4/d8c/tutorial_py_shi_tomasi.html
DISPLAY_RADIUS = 3
DISPLAY_COLOR = (0, 255, 0)
def keypointDetectionDemo(im0):
    # find image locations that may be good for tracking
    feature_params = dict(maxCorners=300,
                          qualityLevel=0.2,
                          minDistance=7,
                          blockSize=5)
    p0 = cv2.goodFeaturesToTrack(im0, mask=None, **feature_params)
    # now p0 should contain an array of (floating-point) pixel locations
    if p0 is None:
        print("no keypoints were found!")
        return
    print(f'Number of detected keypoints = {p0.shape[0]}')
    # convert to kx2 format, where k is the number of feature points
    corners = p0.reshape(-1, 2)
    # draw a small circle at each detected point and display the result
    im0color = cv2.cvtColor(im0, cv2.COLOR_GRAY2BGR)
    cornersInt = np.intp(np.round(corners))  # integer coordinates for drawing
    for c in cornersInt:
        x, y = c.ravel()  # returns a contiguous flattened array
        cv2.circle(im0color, (int(x), int(y)), DISPLAY_RADIUS, DISPLAY_COLOR)
    cv2_imshow(im0color)
    return
# load and display a sample image, detect features, and display the results
im0 = cv2.imread("hotel_images/hotel.seq00.png", cv2.IMREAD_GRAYSCALE)
keypointDetectionDemo(im0)
Apply the Kanade-Lucas-Tomasi tracking procedure to track the keypoints that were detected in the previous code example. Those keypoints were detected in image 0 of the hotel sequence, and the goal is to track them throughout images 1 to 50.
The KLT procedure operates on a pair of images that were captured at times t and t+1. For a given keypoint at (x, y) at time t, the procedure tries to find the new location (x', y') of the same keypoint at time t+1. For reference, the following pseudocode is from the lecture slides.
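The pseudocode itself is not reproduced here, but the linear system it solves at each iteration is the standard KLT update (check this against your lecture slides):

$$
\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
= - \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}
$$

where the sums run over a window centered on the keypoint, $I_x$ and $I_y$ are spatial gradients of the image at time t, and $I_t$ is the temporal difference between the two images. Solving for $(u, v)$ gives the displacement, so $(x', y') = (x + u, y + v)$; the solve is repeated until the update becomes small.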
The KLT tracking procedure assumes small movements of keypoints from t to t+1. For this reason, tracking needs to be performed at subpixel resolution. Keypoint locations therefore need to be maintained using floating point values.
The matrix equation depends on spatial gradients, which can be noisy. For this reason, you should perform smoothing of the spatial gradients when computing these matrices.
You are allowed to use the OpenCV function getRectSubPix, which takes an image location at subpixel resolution, and interpolates neighboring pixel values to return a small region of interest from the image. For computing the summations in the pseudocode, it is suggested that you use a window size of 15x15 surrounding the keypoint. Some useful functions here are np.sum, np.multiply, etc.
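If you are curious what getRectSubPix computes, the following pure-NumPy sketch reproduces its core idea: bilinear interpolation of a small window around a subpixel center. This is only an illustration (the function name `bilinear_patch` is made up, and the border handling is simplified compared to OpenCV's); in your solution you should call `cv2.getRectSubPix` directly.

```python
import numpy as np

def bilinear_patch(img, center, size):
    """Extract a size x size patch centered at a subpixel (x, y) location,
    using bilinear interpolation -- the same idea as cv2.getRectSubPix."""
    h = size // 2
    cx, cy = center
    # subpixel sample coordinates of the patch
    xs = np.arange(size) - h + cx
    ys = np.arange(size) - h + cy
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    fx = xs - x0  # fractional parts used as interpolation weights
    fy = ys - y0
    # simplified border handling: clamp indices inside the image
    x0 = np.clip(x0, 0, img.shape[1] - 2)
    y0 = np.clip(y0, 0, img.shape[0] - 2)
    wx = fx[None, :]
    wy = fy[:, None]
    # weighted sum of the four neighboring pixels
    return ((1 - wy) * (1 - wx) * img[np.ix_(y0, x0)]
            + (1 - wy) * wx * img[np.ix_(y0, x0 + 1)]
            + wy * (1 - wx) * img[np.ix_(y0 + 1, x0)]
            + wy * wx * img[np.ix_(y0 + 1, x0 + 1)])
```

On a linear-ramp test image the interpolation is exact, which makes this easy to sanity-check before relying on subpixel sampling in your tracker.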
The first function that you should implement is getNextPoints in the next code block. It should apply the KLT procedure using a pair of images for a given collection of feature points. For each feature point in an image at time t, the procedure tries to find the location of the corresponding point in an image at time t+1.
It is likely that some of the keypoints will eventually move out of the image frame as they are tracked over the entire sequence. To handle these situations in a simple way, it is suggested that you maintain a movedOutFlag vector of True/False values. If an element in movedOutFlag is False, then the associated keypoint is inside the image frame and should be processed; but if True, then getNextPoints should ignore that keypoint.
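A minimal sketch of this bookkeeping is shown below; the frame size and keypoint coordinates are made-up values for illustration only.

```python
import numpy as np

# hypothetical frame size and keypoints, for illustration
H, W = 480, 640
xy = np.array([[100.0, 200.0], [639.5, 10.0], [-3.0, 50.0]])
movedOutFlag = np.zeros(xy.shape[0], dtype=bool)

for i, (x, y) in enumerate(xy):
    if movedOutFlag[i]:
        continue  # once a point leaves the frame, never process it again
    inside = (0 <= x < W) and (0 <= y < H)
    movedOutFlag[i] = not inside

# only the third point (x = -3) lies outside the frame and gets flagged
```

Note that the flag is "sticky": once set to True it is never cleared, so a point that drifts out of frame stays retired for the rest of the sequence.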
def linear_filter(img_in, kernel):
    '''Cross-correlate img_in with kernel; border pixels are left unchanged.'''
    kh, kw = kernel.shape
    bh, bw = kh // 2, kw // 2
    img_out = img_in.astype(np.float32).copy()
    for i in range(bh, img_in.shape[0] - bh):
        for j in range(bw, img_in.shape[1] - bw):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += float(img_in[i - bh + a, j - bw + b]) * kernel[a, b]
            img_out[i, j] = min(255.0, abs(acc))
    return img_out  # each pixel is of type np.float32
def getNextPoints(im1, im2, xy, movedOutFlag):
    '''Track keypoints from image im1 to image im2
    Input:
      im1: grayscale image at time t; shape (m, n)
      im2: grayscale image at time t+1; shape (m, n)
      xy: a numpy array of size kx2, where k is the number of keypoints.
          Each keypoint is of the form [x, y], with both in floating-point format
      movedOutFlag: array of True/False values of size k; keypoints already
          flagged True are ignored
    Output:
      xy2: updated keypoint locations; same format as the xy input
      movedOutFlag: array of True/False values, of size k, to indicate whether
          each associated keypoint has moved outside the dimensions of the image
    TO DO: Implement the getNextPoints function.
    '''
    xy2 = np.copy(xy)
    movedOutFlag = np.asarray(movedOutFlag, dtype=bool)
    # smooth both images to reduce noise in the spatial gradients
    imageGauss1 = cv2.GaussianBlur(im1, (7, 7), 0)
    imageGauss2 = cv2.GaussianBlur(im2, (7, 7), 0)
    RectWindow = (15, 15)
    for i in range(xy.shape[0]):
        if movedOutFlag[i]:
            continue  # this keypoint already left the frame; do not process it
        x = xy[i, 0]
        y = xy[i, 1]
        P = cv2.getRectSubPix(imageGauss1, RectWindow, (x, y)).astype(np.float32)
        # central-difference spatial gradients over the window
        I_x = (cv2.getRectSubPix(imageGauss1, RectWindow, (x + 1, y)).astype(np.float32)
               - cv2.getRectSubPix(imageGauss1, RectWindow, (x - 1, y)).astype(np.float32)) / 2.0
        I_y = (cv2.getRectSubPix(imageGauss1, RectWindow, (x, y + 1)).astype(np.float32)
               - cv2.getRectSubPix(imageGauss1, RectWindow, (x, y - 1)).astype(np.float32)) / 2.0
        # windowed sums of the gradient products
        I_x_2 = np.sum(I_x * I_x)
        I_y_2 = np.sum(I_y * I_y)
        I_x_y = np.sum(I_x * I_y)
        # form the 2x2 matrix on the left-hand side of the KLT system
        lhs = np.array([[I_x_2, I_x_y], [I_x_y, I_y_2]], dtype=np.float64)
        if np.linalg.det(lhs) != 0:
            lhs_inv = np.linalg.inv(lhs)
            # cap the iteration count so we cannot enter an infinite loop
            for _ in range(30):
                It = cv2.getRectSubPix(imageGauss2, RectWindow, (x, y)).astype(np.float32) - P
                I_x_t = np.sum(I_x * It)
                I_y_t = np.sum(I_y * It)
                # solve for the update [u, v]
                uv = np.matmul(lhs_inv, -np.array([I_x_t, I_y_t]))
                x = x + uv[0]
                y = y + uv[1]
                # stop once the update is negligibly small (converged)
                if np.hypot(uv[0], uv[1]) < 1e-2:
                    break
            xy2[i] = (x, y)
        # flag the keypoint if it has moved outside the image
        if (0 <= x < im1.shape[1]) and (0 <= y < im1.shape[0]):
            movedOutFlag[i] = False
        else:
            movedOutFlag[i] = True
    return (xy2, movedOutFlag)
Use the following code block to test your getNextPoints function using only a few keypoints. If you would like to add more tests, place the new code after the code that is already present. If your code is working correctly, then this test should display the keypoints at correct locations in both images.
# Test getNextPoints with only a few keypoints, and with only one pair of images
im1 = cv2.imread("hotel_images/hotel.seq00.png", cv2.IMREAD_GRAYSCALE)
im2 = cv2.imread("hotel_images/hotel.seq01.png", cv2.IMREAD_GRAYSCALE)
# a few example keypoints
xy = np.array([[192., 263.], [245., 281.], [ 31., 414.]])
# initialize this flag to False, to indicate that all keypoints are inside the image
movedOutFlag = np.zeros(xy.shape[0], dtype=bool)
xy2, movedOutFlag = getNextPoints(im1, im2, xy, movedOutFlag)
print (f'initial keypoints = {xy}')
print (f'updated keypoints = {xy2}')
print (f'updated flag = {movedOutFlag}')
# For both images, draw a small circle at each keypoint and display the result
DISPLAY_RADIUS = 3
DISPLAY_COLOR = (0, 255, 0)
im1color = cv2.cvtColor(im1, cv2.COLOR_GRAY2BGR)
corners = np.intp(np.round(xy))
for c in corners:
    x, y = c.ravel()
    cv2.circle(im1color, (int(x), int(y)), DISPLAY_RADIUS, DISPLAY_COLOR)
cv2_imshow(im1color)
im2color = cv2.cvtColor(im2, cv2.COLOR_GRAY2BGR)
corners = np.intp(np.round(xy2))
for c in corners:
    x, y = c.ravel()
    cv2.circle(im2color, (int(x), int(y)), DISPLAY_RADIUS, DISPLAY_COLOR)
cv2_imshow(im2color)
Now that you have tested your tracking code on a small scale, it is time to work with more keypoints and longer image sequences.
The next code block provides a suggested structure for the overall tracking solution. The top-level function is mainFunction, which should generate the required results after you have finished. Examine this top-level function to see how the other functions fit together. Read the comment blocks for more detailed descriptions.
Write additional code to complete the next code block. The only parts that you should need to update are marked "TO DO". The grader should only need to run mainFunction in order to generate and see your final KLT tracking results. Write the code so that it automatically displays the following:
(You are allowed to make further modifications to the code. But in those cases, please provide detailed comments to explain your reasons for making the changes.)
# KLT TRACKING - main code block
# global variables - try to write your code without needing additional globals
#NUMBER_OF_IMAGES = 51 # try smaller values here for initial testing
NUMBER_OF_IMAGES = 51
DISPLAY_RADIUS = 3
GREEN = (0, 255, 0)
YELLOW = (0, 255, 255)
def mainFunction():
    '''This is the main "driver" function that performs KLT tracking'''
    print("In mainFunction")
    allImgs = readImages(NUMBER_OF_IMAGES)
    print(f'number of images that were read = {len(allImgs)}')
    # get initial keypoints from image 0
    image0 = allImgs[0]
    xy = getKeypoints(image0)
    if xy is None:
        print("no points to track!")
        return
    print(f'number of detected keypoints = {xy.shape[0]}')
    # display keypoints for image 0
    image0color = cv2.cvtColor(image0, cv2.COLOR_GRAY2BGR)
    corners = np.intp(np.round(xy))
    for c in corners:
        x, y = c.ravel()
        cv2.circle(image0color, (int(x), int(y)), DISPLAY_RADIUS, GREEN)
    # track the initial keypoints through all remaining images
    xyt = trackPoints(xy, allImgs)
    # in image 0, draw the paths taken by the keypoints
    drawPaths(image0color, xyt)
    return
def readImages(filecount):
    '''Read a sequence of image files, starting with image 0 in the 'hotel' sequence
    Input:
      filecount: how many image files to read
    Output:
      allImages: a list of OpenCV images in sequential order
    '''
    print("In function readImages")
    allImages = []
    for i in range(filecount):
        print(f'reading image {i:02d}')
        imagetmp = cv2.imread(f'hotel_images/hotel.seq{i:02d}.png', cv2.IMREAD_GRAYSCALE)
        allImages.append(imagetmp)
    return allImages
def getKeypoints(im0):
    '''Find keypoints that will be good for tracking;
    you are allowed to copy code directly from keypointDetectionDemo
    Input:
      im0: grayscale source image with shape (m, n)
    Output:
      corners: a numpy array of size kx2, where k is the number of keypoints.
          Each keypoint is of the form [x, y], where x and y are floating-point
          values that represent image locations in the (horizontal, vertical)
          directions, respectively.
    '''
    print("In function getKeypoints")
    feature_params = dict(maxCorners=300,
                          qualityLevel=0.2,
                          minDistance=7,
                          blockSize=5)
    p0 = cv2.goodFeaturesToTrack(im0, mask=None, **feature_params)
    if p0 is None:
        print("no keypoints were found!")
        return None
    print(f'Number of detected keypoints = {p0.shape[0]}')
    # convert to kx2 format, where k is the number of feature points
    corners = p0.reshape(-1, 2)
    return corners
def trackPoints(xy, imageSequence):
    '''Track keypoints through the given image sequence
    Input:
      xy: a numpy array containing keypoints for the first image in imageSequence;
          format is identical to 'corners' in getKeypoints
      imageSequence: a list of OpenCV images in sequential order
    Output:
      xyt: a list of [x, y] integer locations visited by in-frame keypoints,
          accumulated over all time steps, as needed by the drawPaths function
    TO DO: Update the trackPoints function
    '''
    print("In function trackPoints")
    print(f'length of imageSequence = {len(imageSequence)}')
    movedOutFlag = np.zeros(xy.shape[0], dtype=bool)
    # xyt accumulates every in-frame location so drawPaths can plot the trails
    xyt = []
    for t in range(len(imageSequence) - 1):  # predict for all images except the first
        print(f't = {t}; predicting for t = {t+1}')
        xy2, movedOutFlag = getNextPoints(imageSequence[t], imageSequence[t+1], xy, movedOutFlag)
        xy = xy2
        corners = np.intp(np.round(xy2))
        for c in range(corners.shape[0]):
            if not movedOutFlag[c]:
                xyt.append([corners[c][0], corners[c][1]])
        # for selected instants in time, display the latest image with highlighted keypoints
        if t in (0, 10, 20, 30, 40, 49):
            im2color = cv2.cvtColor(imageSequence[t+1], cv2.COLOR_GRAY2BGR)
            for c in range(corners.shape[0]):
                if not movedOutFlag[c]:
                    cv2.circle(im2color, (int(corners[c][0]), int(corners[c][1])),
                               DISPLAY_RADIUS, GREEN)
            cv2_imshow(im2color)
    return xyt
def drawPaths(im0color, xyt):
    '''In the given image, draw paths that were taken by each keypoint during tracking
    Input:
      im0color: a color image with shape (m, n, 3), typically the first image in a sequence
      xyt: a list of [x, y] integer keypoint locations accumulated during tracking
    TO DO: Implement the drawPaths function
    '''
    print("In function drawPaths")
    for (x, y) in xyt:
        # skip any locations that fall outside the image bounds
        if 0 <= y < im0color.shape[0] and 0 <= x < im0color.shape[1]:
            im0color[y, x, :] = YELLOW
    print("FINISHED: here are the paths of the tracked keypoints")
    cv2_imshow(im0color)
The grader should only need to run mainFunction in the next block in order to run your code and see the required output.
# Run the KLT tracking code
mainFunction()
Here is an example output from drawPaths. With close examination, you can see that it is not perfect for all keypoints. It is okay if your result is not perfect, either. (But try to make your output at least this good.)
def getKeypoints(ImageInput):
    '''Self-implemented Harris-style keypoint detector; overrides the earlier version'''
    print("In self-implemented getKeypoints function")
    corners = []
    ImageInput = cv2.GaussianBlur(ImageInput, (5, 5), 0.2)
    I_y, I_x = np.gradient(ImageInput.astype(np.float32))
    I_x_x = I_x * I_x
    I_y_y = I_y * I_y
    I_x_y = I_x * I_y
    k = 0.05   # Harris sensitivity constant
    half = 2   # half-width of the 5x5 summation window
    height, width = I_x_x.shape
    R = np.zeros((height, width))
    for y in range(half, height - half):
        for x in range(half, width - half):
            S_x_x = np.sum(I_x_x[y-half:y+half+1, x-half:x+half+1])
            S_y_y = np.sum(I_y_y[y-half:y+half+1, x-half:x+half+1])
            S_x_y = np.sum(I_x_y[y-half:y+half+1, x-half:x+half+1])
            detH = (S_x_x * S_y_y) - (S_x_y ** 2)
            traceH = S_x_x + S_y_y
            R[y, x] = detH - k * traceH ** 2
    # keep one candidate per 5x5 cell whose local maximum response is strong enough
    for y in range(half, height - half, 5):
        for x in range(half, width - half, 5):
            r = np.max(R[y-half:y+half+1, x-half:x+half+1])
            if r > 4e6:
                corners.append([x, y])
    return np.float32(np.array(corners))
mainFunction()
# TO DO: Provide the full path to your Jupyter notebook file
!jupyter nbconvert --to html "/content/drive/MyDrive/Comp Vision/Homework4_USERNAME.ipynb"